 recurrence relation



Subcritical Signal Propagation at Initialization in Normalization-Free Transformers

arXiv.org Machine Learning

We study signal propagation at initialization in transformers through the averaged partial Jacobian norm (APJN), a measure of gradient amplification across layers. We extend APJN analysis to transformers with bidirectional attention and permutation-symmetric input token configurations by deriving recurrence relations for activation statistics and APJNs across layers. Our theory predicts how attention modifies the asymptotic behavior of the APJN at large depth and matches APJNs measured in deep vision transformers. The criticality picture known from residual networks carries over to transformers: the pre-LayerNorm architecture exhibits power-law APJN growth, whereas transformers with LayerNorm replaced by elementwise $\tanh$-like nonlinearities have stretched-exponential APJN growth, indicating that the latter are subcritical. Applied to Dynamic Tanh (DyT) and Dynamic erf (Derf) transformers, the theory explains why these architectures can be more sensitive to initialization and optimization choices and require careful tuning for stable training.
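As a rough illustration of the quantity discussed above, the sketch below estimates an APJN by Monte Carlo for a plain bias-free MLP at initialization. It is not the transformer setting of the paper. The function name `apjn_mlp`, the weight scaling `sigma_w / sqrt(width)`, and the normalization (squared Frobenius norm of the layer-to-layer Jacobian divided by the width) are assumptions chosen for this example.

```python
import numpy as np

def apjn_mlp(depth, width, sigma_w=1.0, phi=np.tanh,
             dphi=lambda x: 1.0 - np.tanh(x) ** 2, n_samples=8, seed=0):
    """Monte Carlo estimate of the APJN between layer 0 and layers 1..depth
    for a toy MLP h^{l+1} = W^{l+1} phi(h^l), W_ij ~ N(0, sigma_w^2 / width).
    (Hypothetical helper; an assumption-laden sketch, not the paper's code.)"""
    rng = np.random.default_rng(seed)
    apjn = np.zeros(depth)
    for _ in range(n_samples):
        h = rng.standard_normal(width)   # random input preactivation
        J = np.eye(width)                # d h^0 / d h^0
        for l in range(depth):
            W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
            # Chain rule: d h^{l+1} / d h^l = W diag(phi'(h^l))
            J = (W * dphi(h)) @ J
            h = W @ phi(h)
            # Squared Frobenius norm of the partial Jacobian, per output unit
            apjn[l] += np.sum(J ** 2) / width
    return apjn / n_samples

if __name__ == "__main__":
    print(apjn_mlp(depth=20, width=128))
```

Plotting the returned array against depth on log axes is one way to see, in this toy model, whether the growth or decay looks closer to a power law or to an exponential.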






INR-Bench: A Unified Benchmark for Implicit Neural Representations in Multi-Domain Regression and Reconstruction

arXiv.org Artificial Intelligence

Implicit Neural Representations (INRs) have proven successful in various signal processing tasks owing to their continuity and infinite resolution. However, the factors influencing their effectiveness and limitations remain underexplored. To better understand these factors, we leverage insights from Neural Tangent Kernel (NTK) theory to analyze how model architectures (classic MLP and emerging KAN), positional encoding, and nonlinear primitives affect the response to signals of varying frequencies. Building on this analysis, we introduce INR-Bench, the first comprehensive benchmark specifically designed for multimodal INR tasks. It includes 56 variants of Coordinate-MLP models (featuring 4 types of positional encoding and 14 activation functions) and 22 Coordinate-KAN models with distinct basis functions, evaluated across 9 implicit multimodal tasks. These tasks cover both forward and inverse problems, offering a robust platform to highlight the strengths and limitations of different neural models, thereby establishing a solid foundation for future research. The code and dataset are available at https://github.com/lif314/INR-Bench.
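For readers unfamiliar with positional encoding in coordinate networks, here is a minimal sketch of one widely used variant, a sinusoidal (Fourier-feature) encoding with frequencies doubling at each level. The abstract does not name the four encodings in the benchmark, so this choice, the function name `fourier_encode`, and the `num_freqs` parameter are assumptions made for illustration only.

```python
import numpy as np

def fourier_encode(coords, num_freqs=4):
    """Map coordinates of shape (n, d) to sinusoidal features of shape
    (n, d * 2 * num_freqs). Hypothetical helper, not taken from INR-Bench."""
    coords = np.asarray(coords, dtype=float)
    # Frequencies pi, 2*pi, 4*pi, ... (one octave per level)
    freqs = np.pi * 2.0 ** np.arange(num_freqs)
    angles = coords[..., None] * freqs            # shape (n, d, num_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)

if __name__ == "__main__":
    xy = np.random.default_rng(0).uniform(-1, 1, size=(5, 2))
    print(fourier_encode(xy).shape)
```

The encoded features would then be fed to the coordinate MLP in place of the raw coordinates; higher `num_freqs` exposes higher-frequency components of the signal to the network.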



Computing Linear Regions in Neural Networks with Skip Connections

arXiv.org Artificial Intelligence

A neural network is a composition of neurons, where each neuron can be represented as a nonlinear function depending on inputs and parameters, called weights and biases. The nonlinearity of the network can be understood via tropical geometry, in particular for networks with ReLU activation functions, which are piecewise linear. For such networks, we introduce an algorithm to compute all linear regions of a neural network. A linear region of a neural network is a connected region on which the map defined by the network is linear. Knowing those linear regions allows for quicker predictions, as demonstrated by our new caching algorithm. Our algorithms work for networks with skip connections. Skip connections add the output of previous layers to the input of later layers, skipping over the layers in between. The expository paper [2] offers promising avenues to study neural networks.
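To make the notion of a linear region concrete, the sketch below takes a toy ReLU network with skip connections of the form x_{k+1} = x_k + ReLU(W_k x_k + b_k), records which units are active at a given input, and assembles the affine map A x + c that the network equals on the region containing that input. This is not the paper's algorithm: the layer form, the function names, and the idea of keying a cache on the activation pattern are assumptions made for illustration.

```python
import numpy as np

def forward(x, layers):
    """Plain forward pass of the toy residual ReLU network."""
    h = np.asarray(x, dtype=float)
    for W, b in layers:
        h = h + np.maximum(W @ h + b, 0.0)   # skip connection adds the input back
    return h

def region_affine_map(x, layers):
    """Return (pattern, A, c) such that the network equals A @ z + c for all z
    in the region sharing x's activation pattern. Hypothetical helper."""
    h = np.asarray(x, dtype=float)
    n = h.shape[0]
    A, c = np.eye(n), np.zeros(n)
    pattern = []
    for W, b in layers:
        pre = W @ h + b
        mask = pre > 0
        pattern.append(tuple(mask.tolist()))
        D = np.diag(mask.astype(float))
        # On this region: h_next = h + D (W h + b) = (I + D W) h + D b
        M = np.eye(n) + D @ W
        A, c = M @ A, M @ c + D @ b
        h = h + np.maximum(pre, 0.0)
    return tuple(pattern), A, c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [(rng.standard_normal((3, 3)), rng.standard_normal(3)) for _ in range(4)]
    x = rng.standard_normal(3)
    pattern, A, c = region_affine_map(x, layers)
    print(np.allclose(A @ x + c, forward(x, layers)))
```

Because the pair (A, c) depends only on the activation pattern, one could store it in a dictionary keyed by `pattern` and reuse it for later inputs known to lie in the same region.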